Abstract- Forecasting plays an important role in business, technology and many other fields. It helps organizations increase profits, reduce lost sales and plan production more efficiently. A parallel algorithm for forecasting on the OTIS-Mesh was reported recently [9]. That algorithm requires $5(\sqrt{n}-1)$ electronic steps and 4 optical steps. In this paper we present an improved parallel algorithm for short-term time series forecasting on the OTIS-Mesh. The algorithm requires $5(\sqrt{n}-1)$ electronic steps and 1 optical step, using the same number of I/O ports as in [9], and is shown to be an improvement over the parallel algorithm for time series forecasting on the OTIS-Mesh [9].

Index Terms- Time Series Forecasting, Parallel Algorithm, 
OTIS-Mesh. 

I. INTRODUCTION 

Many businesses and organizations benefit from forecasting in terms of increased profits, fewer lost sales and more efficient production planning. Forecasting also plays an important role in many other areas, such as weather forecasting and flood forecasting. Many researchers have implemented forecasting in parallel on different interconnection networks, and forecasting can also be mapped onto the OTIS network. The Optical Transpose Interconnection System (OTIS) [1],[2] is a hybrid architecture that benefits from both optical and electronic connections. Optical connections are used to connect processors when the distance between them exceeds a few millimetres (processors in different packages), and electronic connections are used to connect nearby processors (within the same physical package). Several models exploit this idea of combining optical and electronic connections. In an OTIS-Mesh, $n^2$ processors are divided into $n$ groups, and the processors in each group follow a $\sqrt{n} \times \sqrt{n}$ 2D mesh layout. According to the OTIS rule, processor $P$ of group $G$ is connected by an optical link to processor $G$ of group $P$. The OTIS variant is determined by the interconnection among the processors within a group. The topology of the OTIS-Mesh is shown in Figure 1.
Figure 1. OTIS-Mesh network
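The addressing described above can be made concrete with a small sketch. It assumes 0-based group and processor indices (the text uses 1-based numbering), numbers the processors of a group row-major on the $\sqrt{n} \times \sqrt{n}$ mesh, and uses our own hypothetical function names; it only models which processors are linked, not the optical hardware.

```python
import math
from typing import List, Tuple

# A processor is addressed as (group G, processor P); both are 0-based and
# processors inside a group are numbered row-major on a sqrt(n) x sqrt(n) mesh.
Node = Tuple[int, int]


def otis_neighbor(node: Node) -> Node:
    """Optical (OTIS) link: processor P of group G <-> processor G of group P.

    Processors with G == P map to themselves, i.e. they have no optical link.
    """
    g, p = node
    return (p, g)


def mesh_neighbors(node: Node, n: int) -> List[Node]:
    """Electronic links of a processor inside its sqrt(n) x sqrt(n) group."""
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("n must be a perfect square")
    g, p = node
    row, col = divmod(p, r)
    result: List[Node] = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
        nr, nc = row + dr, col + dc
        if 0 <= nr < r and 0 <= nc < r:
            result.append((g, nr * r + nc))
    return result
```

For $n = 9$ (81 processors), `mesh_neighbors((0, 0), 9)` returns `[(0, 3), (0, 1)]` for a corner processor, and `otis_neighbor((2, 5))` returns `(5, 2)`, so a single OTIS move swaps the roles of group and processor index.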
II. RELATED WORKS

Many researchers have developed parallel algorithms for short-term time series forecasting. Jana and Sinha presented two parallel algorithms for forecasting, implemented on the linear array and complete binary tree models [14], which require $m + 1$ steps and $(m - n + 1) + \log n$ steps, respectively, and

in the extended case [14] they require $\frac{n}{p}(m - n + 1) + p - 1$ and $\frac{n}{p}(m - n + 2) + \log p$ steps on the ST array and ST tree, respectively. Both algorithms are based on the weighted moving average technique. Sudhanshu and Jana presented a parallel algorithm for forecasting on the OTIS-Mesh network [9] that requires $5(\sqrt{n}-1)$ electronic moves and 4 optical moves. In this paper we present an improved parallel algorithm for forecasting on the OTIS-Mesh network. It is also based on the weighted moving average technique and requires $5(\sqrt{n}-1)$ electronic moves and only 1 optical move, so it can be compared directly with the parallel algorithm for forecasting of [9].

III. FORECASTING MODELS

Forecasting models can be divided into qualitative and quantitative forecasting models. We discuss here the time series models of quantitative forecasting. Among the quantitative forecasting models available for the successful implementation of decision-making systems, time series models are very popular. In these models, given a set of past observations $d_1, d_2, \ldots, d_m$, the problem is to estimate $d(m + \tau)$ through extrapolation, where $\tau$ (called the lead time) is a small positive integer, usually set to 1. The observed data values usually show different patterns, such as a constant process, a cyclical pattern and a linear trend, as shown in Figure 2. Several models are available for time series forecasting. However, a particular model may be effective only for a specific pattern of the data; e.g., the simple moving average is very suitable when the data exhibit a constant process.
The weighted moving average is a well-known time series model for short-term forecasting that is suitable when the data exhibit a cyclical pattern around a constant trend. The exponential weighted moving average is a more widely accepted method for short-term forecasting than the (simple) weighted moving average. However, our motivation for parallelizing the weighted moving average comes from the fact that both the exponential weighted moving average and the simple moving average (MA) are special cases of the weighted moving average, as discussed in Section IV. Moreover, finding the optimum value of the window size involves $O(m)$ iterations, where each iteration requires $O(n(m - n + 1))$ time to calculate the $m - n + 1$ weighted moving averages for window size $n$ and data size $m$. In this paper, we present an improved parallel algorithm for short-term forecasting that is based on the weighted moving average time series model and is mapped on the OTIS-Mesh. The algorithm is shown to be an improvement over the algorithm of [9] and requires $5(\sqrt{n}-1)$ electronic moves + 1 OTIS move, using the same number of I/O ports, for a data set of size $m$ and window size $n$ on $n^2$ processors.



Figure 2(a). Constant data pattern 



Figure 2(b). Trend data pattern 
Figure 2(c). Cyclic data pattern

The rest of the paper is organized as follows. Section IV describes the methodology for time series forecasting using the weighted moving average and its special cases. In Section V, we present our proposed parallel algorithm. We also discuss the scalability of the algorithm.

IV. METHODOLOGY

We describe here the methodology for forecasting using the weighted moving average as given in [14]. In this method, for a set of $n$ data values $d_t, d_{t-1}, \ldots, d_{t-n+1}$ and a set of positive weights $w_n, w_{n-1}, \ldots, w_1$, we calculate their weighted moving average at time $t$ by the following formula:



$$WM(t) = \frac{w_n d_t + w_{n-1} d_{t-1} + w_{n-2} d_{t-2} + \cdots + w_1 d_{t-n+1}}{w_n + w_{n-1} + w_{n-2} + \cdots + w_1} \qquad (1)$$

where $w_n \ge w_{n-1} \ge w_{n-2} \ge \cdots \ge w_1 > 0$. We then use $WM(t)$ to estimate the forecast value $d(t + \tau)$ at time $t + \tau$, i.e., $\hat{d}(t + \tau) = WM(t)$. The quality of the forecast depends on the selection of the window size $n$. Therefore, to find the optimum value of $n$, we calculate $m - n + 1$ weighted moving averages for a specific value of $n$ by sliding the window over the data values, and the corresponding mean square error (MSE) is calculated using



$$MSE = \frac{\sum_{t=n+\tau}^{m} \left[d_t - WM(t - \tau)\right]^2}{m - n - \tau + 1} \qquad (2)$$



We then vary the window size $n$ and obtain the corresponding MSE from the newly calculated weighted moving averages. The same process is repeated for $n = 1, 2, 3, \ldots, m$. The value of $n$ for which the MSE is least is chosen for forecasting.
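The sequential procedure of equations (1) and (2) can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say how the weights are chosen for each window size, so `make_weights` is a hypothetical caller-supplied rule, and the names `wm`, `mse` and `best_window` are our own. The search stops at $n = m - \tau$, the largest window for which equation (2) still has at least one error term.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def wm(d: Sequence[float], w: Sequence[float], t: int) -> float:
    """Equation (1): WM(t) for a 1-based time index t and window n = len(w).

    w[0] is w_1 (applied to the oldest value d_{t-n+1}); w[-1] is w_n (applied to d_t).
    """
    n = len(w)
    window = d[t - n:t]  # d_{t-n+1}, ..., d_t in 0-based slicing
    return sum(wi * di for wi, di in zip(w, window)) / sum(w)


def mse(d: Sequence[float], w: Sequence[float], tau: int = 1) -> float:
    """Equation (2): mean square error of forecasting d_t by WM(t - tau)."""
    n, m = len(w), len(d)
    sq = [(d[t - 1] - wm(d, w, t - tau)) ** 2 for t in range(n + tau, m + 1)]
    return sum(sq) / (m - n - tau + 1)


def best_window(
    d: Sequence[float],
    make_weights: Callable[[int], List[float]],
    tau: int = 1,
) -> Optional[Tuple[int, float]]:
    """Try every window size that has at least one error term; keep the least MSE."""
    best: Optional[Tuple[int, float]] = None
    for n in range(1, len(d) - tau + 1):
        err = mse(d, make_weights(n), tau)
        if best is None or err < best[1]:
            best = (n, err)
    return best
```

For example, `best_window([1.0, 2.0, 3.0, 4.0], lambda n: [1.0] * n)` evaluates MSE values 1.0, 2.25 and 4.0 for $n = 1, 2, 3$ and returns `(1, 1.0)`. The nested loops make the $O(m)$ iterations of $O(n(m - n + 1))$ work each explicit, which is the cost the parallel algorithm targets.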

Special Cases 

A. Simple Moving Average: 

In this method, equal importance is given to each data value. Hence we assign an equal weight $1/n$ to all the data values and obtain the following moving average:

$$MA(t) = \frac{d_t + d_{t-1} + \cdots + d_{t-n+1}}{n} \qquad (3)$$

As discussed in Section III, this method is best when the data pattern shows a constant process.
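The claim that the simple moving average is a special case of equation (1) can be checked in one line: with equal weights the denominator of (1) equals 1, and the numerator becomes the plain average of the window. The same result holds for any common weight $c > 0$, because $c$ cancels between numerator and denominator.

```latex
w_1 = w_2 = \cdots = w_n = \tfrac{1}{n}
\;\Longrightarrow\;
WM(t) = \frac{\tfrac{1}{n}\,(d_t + d_{t-1} + \cdots + d_{t-n+1})}{n \cdot \tfrac{1}{n}}
      = \frac{d_t + d_{t-1} + \cdots + d_{t-n+1}}{n}
```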

B. Exponential Moving Average: 

In this method, the more recent observations are given larger weights in order to obtain smaller errors, and thus the weights are assigned in decreasing order from the most recent observation backwards. The formula for the exponential moving average is as follows:

$$EMA(t) = \alpha d_t + \alpha(1 - \alpha) d_{t-1} + \alpha(1 - \alpha)^2 d_{t-2} + \cdots + \alpha(1 - \alpha)^{n-1} d_{t-n+1} \qquad (4)$$

where the weight $w_i = \alpha(1 - \alpha)^{n-i}$, $1 \le i \le n$, and $0 < \alpha < 1$. This method is suitable for a cyclical pattern around a constant trend and is widely accepted, especially in business environments. However, the method suffers from the difficulty of selecting a proper value of the parameter $\alpha$, and there is no easy method to do so.
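The relation between equations (4) and (1) can be checked numerically. One detail worth making explicit: the weights $w_i = \alpha(1 - \alpha)^{n-i}$ sum to $1 - (1 - \alpha)^n$ rather than 1, so inserting them into equation (1) reproduces equation (4) up to that normalizing factor, which tends to 1 as $n$ grows. The sketch below, with hypothetical helper names, shows this for a small example.

```python
from typing import List, Sequence


def ema_weights(n: int, alpha: float) -> List[float]:
    """w_i = alpha * (1 - alpha)**(n - i), i = 1..n; w_n = alpha weights d_t."""
    return [alpha * (1 - alpha) ** (n - i) for i in range(1, n + 1)]


def ema(d: Sequence[float], t: int, n: int, alpha: float) -> float:
    """Equation (4) evaluated directly for a 1-based time index t."""
    return sum(alpha * (1 - alpha) ** k * d[t - 1 - k] for k in range(n))


def wm(d: Sequence[float], w: Sequence[float], t: int) -> float:
    """Equation (1); w[0] = w_1 is applied to d_{t-n+1}, w[-1] = w_n to d_t."""
    n = len(w)
    return sum(wi * di for wi, di in zip(w, d[t - n:t])) / sum(w)
```

With $\alpha = 0.5$, $n = 3$ and $d = (1, 2, 3)$, `ema_weights(3, 0.5)` gives `[0.125, 0.25, 0.5]`, `ema` gives 2.125, and `wm(d, w, 3) * sum(w)` gives the same 2.125 because `sum(w)` is $1 - 0.5^3 = 0.875$.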

V. ALGORITHM 

Assume $\tau = 1$. Then the $m - n + 1$ weighted moving averages for a given window size $n$ are obtained from equation (1), along with their error terms, as follows:



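$$
\begin{aligned}
WM(n) &= \frac{w_1 d_1 + w_2 d_2 + w_3 d_3 + \cdots + w_n d_n}{w_1 + w_2 + w_3 + \cdots + w_n} \\
WM(n+1) &= \frac{w_1 d_2 + w_2 d_3 + w_3 d_4 + \cdots + w_n d_{n+1}}{w_1 + w_2 + w_3 + \cdots + w_n} \\
WM(n+2) &= \frac{w_1 d_3 + w_2 d_4 + w_3 d_5 + \cdots + w_n d_{n+2}}{w_1 + w_2 + w_3 + \cdots + w_n} \\
&\;\;\vdots \\
WM(m) &= \frac{w_1 d_{m-n+1} + w_2 d_{m-n+2} + w_3 d_{m-n+3} + \cdots + w_n d_m}{w_1 + w_2 + w_3 + \cdots + w_n}
\end{aligned}
\qquad (5)
$$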
$$
\begin{aligned}
E(n+1) &= d_{n+1} - WM(n) \\
E(n+2) &= d_{n+2} - WM(n+1) \\
E(n+3) &= d_{n+3} - WM(n+2) \\
&\;\;\vdots \\
E(m) &= d_m - WM(m-1)
\end{aligned}
\qquad (6)
$$
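Equations (5) and (6) amount to taking $m - n + 1$ overlapping windows of the data and applying the same weight vector to each; this is the work the parallel algorithm distributes over the groups of the OTIS-Mesh. The following sketch, with hypothetical helper names, computes all averages and error terms for one window size with $\tau = 1$.

```python
from typing import List, Sequence, Tuple


def window_rows(d: Sequence[float], n: int) -> List[List[float]]:
    """Row k (0-based) holds d_{k+1}, ..., d_{k+n}: the data behind WM(n + k)."""
    return [list(d[k:k + n]) for k in range(len(d) - n + 1)]


def averages_and_errors(
    d: Sequence[float], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Equations (5) and (6) with tau = 1.

    Returns [WM(n), ..., WM(m)] and [E(n+1), ..., E(m)].
    """
    n, m = len(w), len(d)
    total = sum(w)
    wm = [sum(wi * x for wi, x in zip(w, row)) / total for row in window_rows(d, n)]
    # d[n + k] is d_{n+k+1} in the paper's 1-based notation.
    errors = [d[n + k] - wm[k] for k in range(m - n)]
    return wm, errors
```

For $d = (1, 2, 3, 4)$ and $w = (1, 1)$ this returns the averages `[1.5, 2.5, 3.5]` ($m - n + 1 = 3$ of them) and the errors `[1.5, 1.5]` ($m - n = 2$ of them, since $WM(m)$ forecasts a value not yet observed).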
For each value of $n$, $1 \le n \le m$, we need to compute a different set of $m - n + 1$ weighted moving averages (as given above), for a maximum of $m$ iterations. However, our target is to parallelize the above computation within a single iteration so that the overall time complexity is significantly reduced. The basic idea is as follows. We initially feed the data values and the weight vector through the boundary processors. Then, using suitable electronic and OTIS moves, they are stored in the D and W registers, respectively. Next, we calculate their products in each processor in parallel. The products are then used to form the local sum in each group, and these sums are finally accumulated using suitable electronic and OTIS moves to produce the weighted moving averages.

VI. PARALLEL ALGORITHM 

Step 1. /*Data Input*/ 

1.1 Feed the data values $d_i$, $1 \le i \le m$, to the boundary processors in the 1st column of each group $G_{xy}$, $1 \le x, y \le \sqrt{n}$, as shown in Figure 3.

1.2 Feed the weights $w_j$, $1 \le j \le n$, to the boundary processors, i.e., to the 1st processor $P_{11}$ of each group $G_{xy}$, $1 \le x, y \le \sqrt{n}$, as shown in Figure 3.
Figure 3. Feeding of data and weights

Step 2. /* Data distribution into D-registers */ 

Shift the data values row-wise to store them in the D registers in a pipelined fashion.
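The pipelined row-wise shift of Step 2 can be sketched for a single row of a group. This is a minimal simulation under our own assumptions: the boundary processor is the leftmost one in the row, one value enters per cycle, every shift moves all values one processor to the right, and the values are fed in reverse order so that processor $c$ ends up holding the $c$-th value. The helper name `pipeline_row_fill` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pipeline_row_fill(values: Sequence[float]) -> Tuple[List[Optional[float]], int]:
    """Fill one row of sqrt(n) processors from its left boundary processor.

    Returns the final D register contents (column 0 first) and the number of
    electronic shift moves used.
    """
    r = len(values)
    row: List[Optional[float]] = [None] * r
    shifts = 0
    # Feed the last value first so that it travels farthest to the right.
    for k, v in enumerate(reversed(values)):
        if k > 0:
            row = [None] + row[:-1]  # one electronic move: every value shifts right
            shifts += 1
        row[0] = v  # the boundary processor receives the next input
    return row, shifts
```

For a row of $\sqrt{n} = 3$ processors, `pipeline_row_fill([1, 2, 3])` returns `([1, 2, 3], 2)`: the row is filled after $\sqrt{n} - 1$ shifts, and since all rows of all groups shift simultaneously, this matches the $\sqrt{n} - 1$ electronic moves charged to Step 2.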

Step 3. /* Broadcast of weights */ 

Perform a column-wise broadcast of the weights fed in step 1.2, in parallel, and store them in the W registers.
Step 4. /* Broadcast of weights */
Perform a row-wise broadcast of the weights stored in step 3 in each group, in parallel, and store them in the W registers.
Illustration 1: The content of the W registers after step 4 is shown in Figure 5.
Figure 5. Content of W registers after step 4
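Steps 3 and 4 can be sketched inside one group, assuming (as in step 1.2) that a single weight arrives at $P_{11}$, stored here at row 0, column 0. The value first travels down the first column, one hop per move, and then all rows copy it to the right in parallel. The helper name `broadcast_from_corner` is hypothetical, and the sketch counts moves rather than modelling hardware timing.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]


def broadcast_from_corner(value: float, r: int) -> Tuple[Grid, int]:
    """Steps 3 and 4 inside one r x r group (r = sqrt(n)).

    Returns the W registers of the group and the number of electronic moves.
    """
    grid: Grid = [[None] * r for _ in range(r)]
    grid[0][0] = value
    moves = 0
    # Step 3: column-wise broadcast down column 0, one hop per move.
    for row in range(1, r):
        grid[row][0] = grid[row - 1][0]
        moves += 1
    # Step 4: row-wise broadcast; all rows advance together, one hop per move.
    for col in range(1, r):
        for row in range(r):
            grid[row][col] = grid[row][col - 1]
        moves += 1
    return grid, moves
```

For $\sqrt{n} = 3$, `broadcast_from_corner(5.0, 3)` fills all nine W registers with 5.0 using 4 moves, i.e., $\sqrt{n} - 1 = 2$ moves for each of steps 3 and 4.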

Step 5. All processors do in parallel:

Perform an OTIS move on the contents of the W registers in each group.

Illustration 2: The content of the W registers after step 5 is shown in Figure 6.
Figure 6. Content of W registers after step 5

Step 6. All processors do in parallel:

Form the products of the contents of the D registers and W registers and store them in the C registers.
Step 7. /* Perform summation */
All groups do steps 7.1 and 7.2 in parallel:

7.1 Sum up the contents of the C registers column-wise and store the sums in the first-row processors of each group.

7.2 Sum up the contents of the W registers column-wise and store the sums in the first-row processors of each group.

Step 8. /* Parallel summation row-wise */
All groups do steps 8.1 and 8.2 in parallel:

8.1 Sum up the contents of the C registers in the first row of each group and store the result in the C register of the first processor $P_{11}$ of each group.

8.2 Sum up the contents of the W registers in the first row of each group and store the result in the W register of the first processor $P_{11}$ of each group.
Illustration 3: We store the data, weights and products in the $D_{ij}$, $W_{ij}$ and $C_{ij}$ registers, $1 \le i, j \le n$, respectively, where $i$ indicates the register number within a group and $j$ indicates the group number. Groups are numbered in row-major order. The C registers and W registers of processor $P_{11}$ after step 8 are shown in Figure 7.
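Steps 7 and 8 form a two-phase reduction inside each $\sqrt{n} \times \sqrt{n}$ group. The sketch below (hypothetical helper `two_phase_sum`, our own simulation) passes partial sums one hop per move: first up every column in parallel until row 0 holds the column sums, then leftwards along row 0 into $P_{11}$. Because 7.1 and 7.2 (and 8.1 and 8.2) run concurrently on the C and W registers, the same move count applies to both.

```python
from typing import List, Tuple


def two_phase_sum(vals: List[List[float]]) -> Tuple[float, int]:
    """Steps 7 and 8 inside one r x r group.

    Returns the total that ends up in P_11 and the number of electronic moves.
    """
    r = len(vals)
    acc = [row[:] for row in vals]  # copy of the C (or W) registers
    moves = 0
    # Step 7: each row sends its partial sums to the row above, all columns at once.
    for row in range(r - 1, 0, -1):
        for col in range(r):
            acc[row - 1][col] += acc[row][col]
        moves += 1
    # Step 8: row 0 is accumulated right-to-left into P_11, one hop per move.
    for col in range(r - 1, 0, -1):
        acc[0][col - 1] += acc[0][col]
        moves += 1
    return acc[0][0], moves
```

For the $3 \times 3$ group holding $1, \ldots, 9$, `two_phase_sum` returns `(45, 4)`: $\sqrt{n} - 1 = 2$ moves for step 7 and 2 for step 8.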
Step 9. Divide the content of the C register by the content of the W register and store the quotient in the R register of the first processor of each group $G_{xy}$.

Remark 1: The final results emerge from the R register of the first processor $P_{11}$ in each group, i.e., $(G_{xy}, P_{11})$, $1 \le x, y \le \sqrt{n}$. The contents of the C and W registers after step 8 are shown in Table I.
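The register contents of Steps 1 to 9 can be checked end to end with a sequential simulation. This sketch models data layout and arithmetic only, not timing. Its layout is inferred from Table I: processor $i$ of group $j$ (groups in row-major order) holds $d_{i+j-1}$ after step 2 and $w_i$ after step 5. For the weights we assume that group $g$ receives $w_g$ at $P_{11}$ and broadcasts it in steps 3 and 4, which is what the OTIS move of step 5 needs to place $w_i$ in processor $i$ of every group. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def otis_mesh_wma(d: Sequence[float], w: Sequence[float]) -> List[float]:
    """Simulate the register contents of Steps 1-9; returns WM(n), ..., WM(2n - 1).

    Assumes n = len(w) is a perfect square and len(d) == 2n - 1, as in Table I.
    Registers are indexed [group j][processor i], both 0-based.
    """
    n = len(w)
    r = math.isqrt(n)
    if r * r != n or len(d) != 2 * n - 1:
        raise ValueError("need a square window n and 2n - 1 data values")
    # After step 2: processor i of group j holds d_{i+j+1} (1-based) = d[i + j].
    D = [[d[i + j] for i in range(n)] for j in range(n)]
    # After steps 1.2, 3 and 4 (assumed): every processor of group g holds w_{g+1}.
    W = [[w[g]] * n for g in range(n)]
    # Step 5: OTIS move, processor i of group j <-> processor j of group i.
    W = [[W[i][j] for i in range(n)] for j in range(n)]
    # Step 6: C = D * W in every processor.
    C = [[D[j][i] * W[j][i] for i in range(n)] for j in range(n)]

    def group_sum(regs: List[float]) -> float:
        # Steps 7-8: column sums into row 0, then row 0 summed into P_11.
        col_sums = [sum(regs[row * r + col] for row in range(r)) for col in range(r)]
        return sum(col_sums)

    # Step 9: R = C / W at P_11 of every group.
    return [group_sum(C[j]) / group_sum(W[j]) for j in range(n)]
```

With $n = 4$, $d = (1, \ldots, 7)$ and $w = (1, 2, 3, 4)$, the simulation returns `[3.0, 4.0, 5.0, 6.0]`, which agrees with evaluating equation (5) directly for $WM(4), \ldots, WM(7)$.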

RESULTS 

We describe here the time complexity of the parallel algorithm mapped on an OTIS-Mesh of $n^2$ processors.

Time Complexity: Steps 2, 3, 4, 7 and 8 require $\sqrt{n} - 1$ electronic moves each. Step 5 requires 1 OTIS move. The remaining steps require constant time. Therefore, the parallel algorithm requires $5(\sqrt{n}-1)$ electronic moves and 1 OTIS move.



Scalability: We now consider a window of arbitrary size and map the above algorithm on a $\sqrt{n} \times \sqrt{n}$ OTIS-Mesh. In other words, we consider the case when the window size is independent of the number of processors. For the sake of simplicity and without loss of generality, let us assume it to be $kn$. Note that in this case the size of the data set will be $2kn - 1$. Accordingly, the data set is also partitioned into $k$ subsets: $\{d_1, d_2, \ldots, d_n\}, \{d_2, d_3, \ldots, d_{n+1}\}, \ldots, \{d_{2kn-n}, d_{2kn-n+1}, \ldots, d_{2kn-1}\}$. Each data subset needs its corresponding weights, so we also partition the weight set into $k$ subsets: $\{w_1, w_2, \ldots, w_n\}, \{w_{n+1}, w_{n+2}, \ldots, w_{2n}\}, \ldots, \{w_{(k-1)n+1}, w_{(k-1)n+2}, \ldots, w_{kn}\}$. Each data subset, together with its weight subset, is fed to the $\sqrt{n} \times \sqrt{n}$ OTIS-Mesh. We then run the above algorithm (Parallel Algorithm) and store the result temporarily. Next, we input another data subset along with the corresponding weight subset, execute Parallel Algorithm and update the current result with the previously calculated partial result.

This process is repeated $k$ times to yield the final result. It is easy to see that this version of the algorithm requires $5k(\sqrt{n}-1)$ electronic moves + $k$ OTIS moves, which is $k$ times the time complexity of Parallel Algorithm.
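The text does not spell out how a partial result updates the current one. One natural reading, which we assume here, is that the C and W partial sums from each pass are accumulated separately and divided only once at the end; the sketch below checks that this reproduces equation (1) for a single average of window size $kn$. It also tabulates the move counts stated above. Both function names are hypothetical, and the sketch does not model how the data subsets are scheduled on the mesh.

```python
import math
from typing import List, Sequence, Tuple


def blockwise_wma(
    window: Sequence[float], w: Sequence[float], n: int
) -> Tuple[float, List[Tuple[float, float]]]:
    """Evaluate one weighted moving average of window size k*n in k passes.

    window holds d_{t-kn+1}, ..., d_t (oldest first); w holds w_1, ..., w_kn.
    Pass b uses the b-th weight subset and the matching n data values.
    """
    if len(window) != len(w) or len(w) % n != 0:
        raise ValueError("window and weights must both have length k * n")
    k = len(w) // n
    partials: List[Tuple[float, float]] = []
    c_total, w_total = 0.0, 0.0
    for b in range(k):
        ws = w[b * n:(b + 1) * n]
        ds = window[b * n:(b + 1) * n]
        c_part = sum(wi * di for wi, di in zip(ws, ds))  # partial C register
        w_part = sum(ws)  # partial W register
        partials.append((c_part, w_part))
        c_total += c_part  # update the current result with the partial result
        w_total += w_part
    return c_total / w_total, partials


def move_counts(n: int, k: int = 1) -> Tuple[int, int]:
    """(electronic, OTIS) moves stated for k passes of the basic algorithm."""
    return 5 * k * (math.isqrt(n) - 1), k
```

For example, with window $(1, 2, 3, 4)$, weights $(1, 2, 3, 4)$ and $n = 2$, the two passes yield partial pairs $(5, 3)$ and $(25, 7)$, and $30 / 10 = 3.0$ matches equation (1) evaluated directly. For $n = 9$, `move_counts(9)` gives `(10, 1)` and `move_counts(9, 2)` gives `(20, 2)`.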



Table I.
Content of C registers and W registers of processor $P_{11}$ of each group after step 8 ($n = 9$; groups numbered in row-major order)

Group | C register | W register
1 | $d_1w_1 + d_2w_2 + d_3w_3 + d_4w_4 + d_5w_5 + d_6w_6 + d_7w_7 + d_8w_8 + d_9w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
2 | $d_2w_1 + d_3w_2 + d_4w_3 + d_5w_4 + d_6w_5 + d_7w_6 + d_8w_7 + d_9w_8 + d_{10}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
3 | $d_3w_1 + d_4w_2 + d_5w_3 + d_6w_4 + d_7w_5 + d_8w_6 + d_9w_7 + d_{10}w_8 + d_{11}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
4 | $d_4w_1 + d_5w_2 + d_6w_3 + d_7w_4 + d_8w_5 + d_9w_6 + d_{10}w_7 + d_{11}w_8 + d_{12}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
5 | $d_5w_1 + d_6w_2 + d_7w_3 + d_8w_4 + d_9w_5 + d_{10}w_6 + d_{11}w_7 + d_{12}w_8 + d_{13}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
6 | $d_6w_1 + d_7w_2 + d_8w_3 + d_9w_4 + d_{10}w_5 + d_{11}w_6 + d_{12}w_7 + d_{13}w_8 + d_{14}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
7 | $d_7w_1 + d_8w_2 + d_9w_3 + d_{10}w_4 + d_{11}w_5 + d_{12}w_6 + d_{13}w_7 + d_{14}w_8 + d_{15}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
8 | $d_8w_1 + d_9w_2 + d_{10}w_3 + d_{11}w_4 + d_{12}w_5 + d_{13}w_6 + d_{14}w_7 + d_{15}w_8 + d_{16}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$
9 | $d_9w_1 + d_{10}w_2 + d_{11}w_3 + d_{12}w_4 + d_{13}w_5 + d_{14}w_6 + d_{15}w_7 + d_{16}w_8 + d_{17}w_9$ | $w_1 + w_2 + w_3 + w_4 + w_5 + w_6 + w_7 + w_8 + w_9$

CONCLUSIONS 

In this paper, we have presented an improved parallel algorithm for short-term forecasting using the weighted moving average technique. The algorithm is mapped on an $n^2$-processor OTIS-Mesh. We have shown that it requires $5(\sqrt{n}-1)$ electronic moves + 1 OTIS move. This parallel algorithm is shown to be an improvement over the parallel algorithm of [9] using the same number of I/O ports. The algorithm is also shown to be scalable.

COMPARATIVE RESULT ANALYSIS 

In Table II we compare the time complexity, in terms of electronic moves and optical moves, of the parallel algorithms mapped on the OTIS-Mesh network. Our proposed algorithm has the same time complexity in terms of electronic moves as [9], but requires only 1 optical move instead of 4.

Table II.
Comparison of OTIS-Mesh based parallel algorithms

Parallel Algorithm | Electronic Moves | Optical Moves
Sudhanshu and Jana [9] | $5(\sqrt{n}-1)$ | 4
Proposed Algorithm | $5(\sqrt{n}-1)$ | 1



FUTURE WORKS 

To implement forecasting on a parallel architecture, data must be shifted efficiently, so parallel architectures that support efficient data shifting should be explored. The properties of OTIS-based networks can also be exploited; therefore, time series forecasting should be mapped onto other OTIS-based networks such as the OTIS-Mesh of Trees and the OTIS-Hypercube.